Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models

Abstract

A Pascal challenge entitled monaural multi-talker speech recognition was developed, targeting the problem of robust automatic speech recognition against speech-like noises, which significantly degrade the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge a team from IBM Research achieved performance better than that of human listeners on this task. The IBM team's method consists of an intermediate speech separation step followed by single-talker speech recognition. This paper reconsiders the task of this challenge based on gain-adapted factorial speech processing models. It develops a joint-token passing algorithm for direct utterance decoding of both the target and masker speakers simultaneously. Compared to the challenge winner, it uses maximum uncertainty during decoding, which cannot be used in the earlier two-phase method. It provides a detailed derivation of inference on these models based on general inference procedures of probabilistic graphical models. As another improvement, it uses deep neural networks for joint-speaker identification and gain estimation, which makes these two steps easier than before while producing competitive results for them. The proposed method outperforms past super-human results and even the results recently achieved by Microsoft Research using deep neural networks. It achieves a 5.5% absolute task performance improvement compared to the first super-human system and a 2.7% absolute task performance improvement compared to its recent competitor.